Covidex Supplementary data

Rambaut et al. nomenclature

This file contains the supplementary data from the Covidex app. All graphs and some tables are interactive, so the reader can explore the data.
First, we present some basic statistics for the training and testing datasets.

Training dataset:
Classification model        | Training date | Sequences | Number of subtypes | Number of trees | mtry | OOB error rate
Rambaut et al. nomenclature | 2020-07-25    | 38087     | 164                | 1000            | 350  | 0.025

Testing dataset:
Classification model        | Sequences | Number of subtypes | Error  | Multi-class AUC
Rambaut et al. nomenclature | 7476      | 164                | 0.0134 | 0.9904
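The testing table reports an error rate and a multi-class AUC, but does not state which multi-class AUC definition was used. As an illustration only, the sketch below assumes the Hand and Till (2001) definition (the average of symmetric pairwise, class-vs-class AUCs) and computes it, together with the plain error rate, from per-sequence class probabilities. The function names and the toy lineage data are hypothetical, not Covidex code.

```python
from itertools import combinations


def pairwise_auc(pos_scores, neg_scores):
    """P(random positive scores higher than random negative); ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def hand_till_auc(y_true, probs):
    """Hand & Till multi-class AUC: mean of symmetric pairwise AUCs.

    y_true: true labels; probs: one dict per sequence mapping class -> probability.
    """
    classes = sorted(set(y_true))
    pair_aucs = []
    for i, j in combinations(classes, 2):
        # A(i|j): separate class i from class j using the predicted P(class i)
        a_ij = pairwise_auc([p[i] for y, p in zip(y_true, probs) if y == i],
                            [p[i] for y, p in zip(y_true, probs) if y == j])
        # A(j|i): the same comparison using the predicted P(class j)
        a_ji = pairwise_auc([p[j] for y, p in zip(y_true, probs) if y == j],
                            [p[j] for y, p in zip(y_true, probs) if y == i])
        pair_aucs.append((a_ij + a_ji) / 2)
    return sum(pair_aucs) / len(pair_aucs)


def error_rate(y_true, y_pred):
    """Fraction of sequences whose predicted class differs from the true class."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: 4 sequences, 3 lineages.
y_true = ["A", "B.1", "B.1", "B.1.1"]
probs = [
    {"A": 0.8, "B.1": 0.1, "B.1.1": 0.1},
    {"A": 0.1, "B.1": 0.7, "B.1.1": 0.2},
    {"A": 0.2, "B.1": 0.1, "B.1.1": 0.7},  # a B.1 sequence misclassified as B.1.1
    {"A": 0.1, "B.1": 0.3, "B.1.1": 0.6},
]
y_pred = [max(p, key=p.get) for p in probs]  # class with the highest probability
print(error_rate(y_true, y_pred))               # 0.25
print(round(hand_till_auc(y_true, probs), 4))  # 0.7917
```

The pairwise comparison is quadratic in the class sizes, which is fine for a sketch; a rank-based formula would scale better to a test set of thousands of sequences.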

The following graph plots the classification probability against the number of ambiguous bases for each sequence. As expected, the proportion of misclassified sequences (red dots) increases as the probability decreases. We also see a trend towards a larger proportion of misclassified sequences as the number of ambiguous bases grows.
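The trend with ambiguous bases can be checked by binning sequences and computing the fraction misclassified in each bin. This is a minimal sketch assuming that "ambiguous bases" means the IUPAC nucleotide ambiguity codes (N, R, Y, K, M, S, W, B, D, H, V) and that each sequence has already been marked as correctly or wrongly classified; the bin edges, function names and example sequences are hypothetical.

```python
from collections import defaultdict

# Assumption: "ambiguous bases" = IUPAC nucleotide ambiguity codes.
AMBIGUOUS = set("NRYKMSWBDHV")


def count_ambiguous(seq):
    """Number of ambiguous IUPAC characters in a nucleotide sequence."""
    return sum(1 for base in seq.upper() if base in AMBIGUOUS)


def misclassification_by_ambiguity(records, edges):
    """Fraction of wrongly classified sequences per ambiguous-base bin.

    records: (sequence, is_correct) pairs.
    edges: ascending bin lower bounds, starting at 0.
    """
    counts = defaultdict(lambda: [0, 0])  # bin lower bound -> [wrong, total]
    for seq, is_correct in records:
        n = count_ambiguous(seq)
        lower = max(e for e in edges if e <= n)
        counts[lower][1] += 1
        if not is_correct:
            counts[lower][0] += 1
    return {b: wrong / total for b, (wrong, total) in sorted(counts.items())}


# Hypothetical example: bins [0, 1), [1, 10) and [10, inf) ambiguous bases.
records = [
    ("ACGTACGT", True),
    ("ACGNNCGT", True),
    ("ACRYACGT", False),
    ("N" * 12 + "ACGT", False),
]
print(misclassification_by_ambiguity(records, [0, 1, 10]))
# {0: 0.0, 1: 0.5, 10: 1.0}
```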

The following table presents evaluation metrics for each class:
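The per-class table itself is not reproduced in this version. Assuming it includes the usual one-vs-rest precision, recall and F1 score, these can be computed from true and predicted labels as in the sketch below; the function name and the toy labels are hypothetical.

```python
def per_class_metrics(y_true, y_pred):
    """One-vs-rest precision, recall and F1 for every class seen in the labels."""
    classes = sorted(set(y_true) | set(y_pred))
    metrics = {}
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)  # correctly assigned to c
        fp = sum(t != c and p == c for t, p in pairs)  # wrongly assigned to c
        fn = sum(t == c and p != c for t, p in pairs)  # members of c that were missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[c] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# Hypothetical labels: one B.1 sequence is predicted as B.1.1.
y_true = ["A", "B.1", "B.1", "B.1.1"]
y_pred = ["A", "B.1", "B.1.1", "B.1.1"]
for cls, m in per_class_metrics(y_true, y_pred).items():
    print(cls, {k: round(v, 3) for k, v in m.items()})
# A {'precision': 1.0, 'recall': 1.0, 'f1': 1.0}
# B.1 {'precision': 1.0, 'recall': 0.5, 'f1': 0.667}
# B.1.1 {'precision': 0.5, 'recall': 1.0, 'f1': 0.667}
```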


The next heatmap shows, for each class, the correlation between the expected classification and the classification obtained by Covidex. Overall, the correlation is high.
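The text does not specify how the heatmap values were computed. One common way to compare expected and obtained classes is a row-normalized confusion matrix, in which each row (expected class) sums to 1 and a strong diagonal indicates agreement. The sketch below builds one under that assumption; the function name and labels are hypothetical.

```python
def normalized_confusion(y_true, y_pred):
    """Confusion matrix with rows = expected class, columns = obtained class,
    each row scaled to sum to 1."""
    classes = sorted(set(y_true) | set(y_pred))
    index = {c: k for k, c in enumerate(classes)}
    matrix = [[0.0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    for row in matrix:
        total = sum(row)
        if total:  # leave rows of classes that never occur as expected labels at 0
            row[:] = [v / total for v in row]
    return classes, matrix


# Hypothetical labels: one B.1 sequence is predicted as B.1.1.
y_true = ["A", "B.1", "B.1", "B.1.1"]
y_pred = ["A", "B.1", "B.1.1", "B.1.1"]
classes, matrix = normalized_confusion(y_true, y_pred)
for cls, row in zip(classes, matrix):
    print(cls, row)
# A [1.0, 0.0, 0.0]
# B.1 [0.0, 0.5, 0.5]
# B.1.1 [0.0, 0.0, 1.0]
```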
 

The precision-recall curve shows that the method performs well.
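For a multi-class problem, a precision-recall curve is usually drawn per class (one-vs-rest) by sweeping a threshold over the predicted probability of that class. The sketch below assumes that setup; the labels and scores are hypothetical, and the step-wise (non-interpolated) average precision is one common summary of the curve, not necessarily the one used by Covidex.

```python
def precision_recall_curve(labels, scores):
    """labels: 1 if the sequence belongs to the class of interest, else 0;
    scores: predicted probability of that class. Returns (recall, precision) points."""
    # Sort by decreasing score and lower the threshold one distinct score at a time.
    pairs = sorted(zip(scores, labels), reverse=True)
    total_pos = sum(labels)
    points = []
    tp = fp = 0
    for k, (score, label) in enumerate(pairs):
        tp += label
        fp += 1 - label
        # Emit a point only once every item tied at this score has been counted.
        if k + 1 == len(pairs) or pairs[k + 1][0] != score:
            points.append((tp / total_pos, tp / (tp + fp)))
    return points


def average_precision(points):
    """Step-wise area: sum of (recall increase) * precision at each threshold."""
    ap, prev_recall = 0.0, 0.0
    for recall, precision in points:
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Hypothetical one-vs-rest data for a single lineage.
labels = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3]
points = precision_recall_curve(labels, scores)
print([(round(r, 3), round(p, 3)) for r, p in points])
# [(0.333, 1.0), (0.333, 0.5), (0.667, 0.667), (1.0, 0.75), (1.0, 0.6)]
print(round(average_precision(points), 4))  # 0.8056
```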